Many business workflows require extracting important fields from form-like documents (e.g. bank statements, bills of lading, and purchase orders). Recent techniques for automating this task work well only when trained with large datasets. In this work we propose a novel data augmentation technique to improve performance when training data is scarce, e.g. 10-250 documents. Our technique, which we call FieldSwap, works by swapping out the key phrases of a source field with the key phrases of a target field to generate new synthetic examples of the target field for use in training. We demonstrate that this approach can yield 1-7 F1 point improvements in extraction performance.
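A minimal sketch of the key-phrase swap described above, assuming a simple (text, field-label) example format and a per-field key-phrase lexicon; the schema, field names, and helper are illustrative assumptions, not the paper's implementation:

```python
import random

def field_swap(example, source_field, target_field, key_phrases):
    """Replace the source field's key phrase with one of the target
    field's key phrases to synthesize a new training example."""
    text, label = example
    new_phrase = random.choice(key_phrases[target_field])
    for phrase in key_phrases[source_field]:
        if phrase in text:
            # The value next to the swapped-in key phrase now serves as a
            # synthetic example of the target field.
            return text.replace(phrase, new_phrase), target_field
    return None  # source key phrase absent; no synthetic example produced

key_phrases = {
    "invoice_date": ["Invoice Date", "Date of Invoice"],
    "due_date": ["Due Date", "Payment Due"],
}
synthetic = field_swap(("Invoice Date: 2023-01-15", "invoice_date"),
                       "invoice_date", "due_date", key_phrases)
print(synthetic)  # e.g. ('Due Date: 2023-01-15', 'due_date')
```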
End-to-end generative methods are considered a more promising solution for image restoration in physics-based vision than traditional deconstructive methods based on handcrafted composition models. However, existing generative methods still leave considerable room for improvement in quantitative performance. More crucially, these methods are regarded as black boxes due to their weak interpretability, and there is rarely a theory that explains their mechanism and learning process. In this study, we re-interpret these generative methods for image restoration tasks using information theory. Departing from the conventional understanding, we analyze the information flow of these methods and identify three sources of information (extracted high-level information, retained low-level information, and external information absent from the source inputs) that are involved and separately optimized in generating the restoration results. We further derive their learning behaviors, optimization objectives, and the corresponding information boundaries by extending the information bottleneck principle. Based on this theoretical framework, we find that many existing generative methods tend to be direct applications of general models designed for conventional generation tasks, which may suffer from over-invested abstraction processes, inherent detail loss, and vanishing gradients or imbalance in training. We analyze these issues with both intuitive and theoretical explanations and support each with empirical evidence. Finally, we propose general solutions or ideas to address these issues and validate the approaches with performance gains on six datasets across three different image restoration tasks.
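As a reference point, the standard information bottleneck objective that the analysis extends can be written as follows (this is the textbook form; the paper's extended boundaries and derivations are not given in the abstract):

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
```

where $X$ is the degraded input, $Y$ the restoration target, $T$ the learned representation, and $\beta$ trades off compressing $X$ against preserving information about $Y$.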
This paper focuses on analyzing and improving the commonsense ability of recent popular vision-language (VL) models. Despite their great success, we observe that existing VL models still lack commonsense knowledge/reasoning ability (e.g., "Lemons are sour"), which is a vital component of artificial general intelligence. Through our analysis, we find that one important reason is that existing large-scale VL datasets do not contain much commonsense knowledge, which motivates us to improve the commonsense of VL models from the data perspective. Rather than collecting a new VL training dataset, we propose a more scalable strategy, "Data Augmentation with kNowledge graph linearization for CommonsensE capability" (DANCE). It can be viewed as a type of data augmentation technique that injects commonsense knowledge into existing VL datasets on the fly during training. More specifically, we leverage a commonsense knowledge graph (e.g., ConceptNet) and create variants of the text descriptions in VL datasets via bidirectional sub-graph sequentialization. For better commonsense evaluation, we further propose the first retrieval-based commonsense diagnostic benchmark. Extensive experiments on representative VL models demonstrate that our DANCE technique significantly improves commonsense ability while maintaining performance on vanilla retrieval tasks. The code and data are available at https://github.com/pleaseconnectwifi/DANCE
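A hedged sketch of the linearization step, assuming ConceptNet-style (head, relation, tail) triples and hand-written templates for both reading directions; the templates, the triple, and the way the sentence is attached to a caption are illustrative assumptions rather than the paper's procedure:

```python
import random

# Templates covering both directions of a relation, to mimic bidirectional
# sub-graph sequentialization (illustrative, not the paper's templates).
TEMPLATES = {
    "HasProperty": ("{h} is {t}.", "{t} is a property of {h}."),
    "IsA": ("{h} is a kind of {t}.", "{t} includes {h}."),
}

def linearize(head, relation, tail):
    forward, backward = TEMPLATES[relation]
    return random.choice([forward, backward]).format(h=head, t=tail)

# Inject the linearized knowledge into a caption on the fly during training.
caption = "a photo of lemons on a table"
augmented = caption + " " + linearize("a lemon", "HasProperty", "sour")
print(augmented)  # e.g. "a photo of lemons on a table a lemon is sour."
```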
Most existing deep-learning-based single-image dynamic scene blind deblurring (SIDSBD) methods design deep networks to directly remove spatially variant motion blur from a single input blurry image, without blur kernel estimation. In this paper, inspired by the projective motion path blur (PMPB) model and deformable convolution, we propose a novel constrained deformable convolutional network (CDCN) for efficient single-image dynamic scene blind deblurring, which simultaneously achieves accurate spatially variant motion blur estimation and high-quality image restoration from only one observed blurry image. In the proposed CDCN, we first construct a novel multi-scale, multi-level, multi-input multi-output (MSML-MIMO) encoder-decoder architecture to enhance the feature-extraction ability. Second, unlike DLVBD methods that use multiple consecutive frames, a novel constrained deformable convolution reshaping (CDCR) strategy is proposed: deformable convolution is first applied to the blur features of the single input blurry image to learn, for each pixel, the sampling points of the motion blur kernel, analogous to estimating the motion density function of camera shake in the PMPB model; a novel PMPB-based reshaping loss is then proposed to constrain the learned sampling points so that they match the relative motion trajectory of each pixel, improving the accuracy of spatially variant motion blur kernel estimation.
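As an illustration of the building block involved (not the paper's CDCR strategy), here is a minimal PyTorch sketch that predicts per-pixel sampling offsets from blur features, applies a deformable convolution, and adds a toy stand-in for the trajectory constraint; the feature shapes and the loss form are assumptions:

```python
import torch
from torchvision.ops import DeformConv2d

feat = torch.randn(1, 64, 32, 32)              # blur features (assumed shape)
offset_head = torch.nn.Conv2d(64, 2 * 3 * 3, 3, padding=1)
deform = DeformConv2d(64, 64, kernel_size=3, padding=1)

offset = offset_head(feat)                     # (N, 2*K*K, H, W) sampling offsets
out = deform(feat, offset)                     # sample features along learned points

# Toy stand-in for the PMPB-based reshaping loss: penalize offset magnitude so
# sampling points stay near a compact motion trajectory (illustrative only).
reshape_loss = offset.pow(2).mean()
```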
An origin-destination (OD) matrix records directional flow data between pairs of OD regions. The complex spatiotemporal dependencies in such matrices make the OD matrix forecasting (ODMF) problem not only intractable but also non-trivial. Moreover, most related methods are designed to forecast very short time series in a specific application scenario, and cannot meet the diverse requirements of practical applications in terms of scenario and forecast length. To address these problems, we propose a Transformer-like model named ODformer with two distinctive features: (i) a novel OD attention mechanism that captures the special spatial dependencies between OD pairs sharing the same origin (destination), which, combined with a 2D-GCN that captures spatial dependencies between OD regions, greatly improves the model's ability to predict across application scenarios; and (ii) a periodic self-attention that effectively forecasts long-sequence OD matrix series while adapting to the periodicity differences of different scenarios. Extensive experiments in three application settings (transportation traffic, IP backbone network traffic, and crowd flow) show that our method outperforms state-of-the-art approaches.
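A hedged sketch of the same-origin attention idea, assuming each OD pair carries a feature vector; batching over origins makes attention run across the flows that share an origin, and transposing the axes gives the symmetric same-destination case (shapes, dimensions, and this exact formulation are illustrative assumptions):

```python
import torch

O, D, C = 8, 8, 16                         # origins, destinations, channels
od_feat = torch.randn(O, D, C)             # one feature vector per OD pair
attn = torch.nn.MultiheadAttention(embed_dim=C, num_heads=4, batch_first=True)

# Batch over origins: within each origin, attention runs across destinations,
# so each flow (o, d) is informed by the other flows leaving o.
same_origin_out, _ = attn(od_feat, od_feat, od_feat)            # (O, D, C)

# The symmetric "same-destination" attention transposes the O and D axes.
od_t = od_feat.transpose(0, 1)
same_dest_out, _ = attn(od_t, od_t, od_t)                       # (D, O, C)
```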
Pathologists need to combine information from differently stained pathology slices to obtain accurate diagnostic results. Deformable image registration is a prerequisite technique for fusing multi-modal pathology slices. This paper proposes a hybrid deep-feature-based deformable image registration framework for stained pathology samples. We first extract dense feature points and perform point matching using two deep-learning feature networks. Then, to further reduce false matches, an outlier detection method combining an isolation forest statistical model and a local affine correction model is proposed. Finally, an interpolation method generates the displacement vector field (DVF) for pathology image registration from the matched points above. We evaluated our method on the dataset of the Automatic Non-rigid Histological Image Registration (ANHIR) challenge, organized in conjunction with the IEEE ISBI 2019 conference. Our technique outperforms traditional approaches, achieving an average relative target registration error (rTRE) of 0.0034. The proposed method achieves state-of-the-art performance and ranked first on the evaluation test dataset. The proposed hybrid deep-feature-based registration method can serve as a reliable approach for pathology image registration.
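The outlier-rejection step can be illustrated with scikit-learn's IsolationForest applied to the displacement vectors of matched keypoints; the point data and contamination rate below are synthetic placeholders, and the local affine correction stage of the method is omitted:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

src = np.random.rand(200, 2)                   # matched points in source slide
dst = src + 0.05 * np.random.rand(200, 2)      # matched points in target slide
disp = dst - src                               # displacement of each match

# Matches whose displacement is isolated from the bulk are likely false.
keep = IsolationForest(contamination=0.1, random_state=0).fit_predict(disp) == 1
src_in, dst_in = src[keep], dst[keep]          # inliers used to interpolate the DVF
```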
This paper proposes an approach for the lifelong learning of recurrent neural networks, such as NNARX, ESN, LSTM, and GRU, used as plant models in control system synthesis. The problem is significant because in many practical applications the model needs to be adapted when new information becomes available and/or the system changes, without storing an ever-growing amount of data. Indeed, in this context several issues arise, such as the well-known catastrophic forgetting and capacity saturation. We propose an adaptation algorithm inspired by moving horizon estimators and derive conditions for its convergence. The described method is applied to a simulated chemical plant, a benchmark already recognized as challenging in the literature, and the main results obtained are discussed.
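A minimal sketch of the moving-horizon flavor of adaptation, under the assumption (not stated in the abstract) that it can be approximated by fitting the most recent window of data while penalizing drift from the previous parameter estimate, similar to an arrival cost; the model, horizon length, and weights are placeholders:

```python
import torch

model = torch.nn.GRU(input_size=1, hidden_size=8, batch_first=True)
head = torch.nn.Linear(8, 1)
params = list(model.parameters()) + list(head.parameters())
prev = [p.detach().clone() for p in params]    # parameters before adaptation
opt = torch.optim.Adam(params, lr=1e-3)

u, y = torch.randn(1, 50, 1), torch.randn(1, 50, 1)   # horizon of H=50 samples
for _ in range(100):
    opt.zero_grad()
    pred = head(model(u)[0])                   # predict outputs over the horizon
    fit = torch.nn.functional.mse_loss(pred, y)
    # Arrival-cost-like proximity term limits forgetting of earlier behavior.
    prox = sum((p - q).pow(2).sum() for p, q in zip(params, prev))
    (fit + 1e-2 * prox).backward()
    opt.step()
```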
How would you restore a physical object with some of its parts missing? You might imagine its original shape from previously captured images, first recovering its overall (global) but coarse shape and then refining its local details. We are motivated to imitate this physical repair procedure to address point cloud completion. To this end, we propose a cross-modal shape-transfer dual-refinement network (termed CSDN), a coarse-to-fine paradigm with images participating in the full cycle, for quality point cloud completion. CSDN mainly consists of "shape fusion" and "dual-refinement" modules to tackle the cross-modal challenge. The first module transfers intrinsic shape characteristics from single images to guide the geometry generation of the missing regions of point clouds, in which we propose IPAdaIN to embed the global features of both the image and the partial point cloud for completion. The second module refines the coarse output by adjusting the positions of the generated points, where the local refinement unit exploits the geometric relations between the novel and input points via graph convolution, and the global constraint unit utilizes the input image to fine-tune the generated offsets. Unlike most existing methods, CSDN not only explores complementary information from images but also effectively exploits cross-modal data throughout the entire coarse-to-fine completion procedure. Experimental results show that CSDN performs favorably against competitors on ten cross-modal benchmarks.
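The abstract names IPAdaIN without defining it, so the sketch below shows only the underlying adaptive instance normalization idea: point-cloud features are instance-normalized, then re-scaled and re-shifted with statistics predicted from the image's global feature (the heads, shapes, and wiring are assumptions, not the paper's module):

```python
import torch

B, C, N = 2, 64, 1024                       # batch, channels, points (assumed)
point_feat = torch.randn(B, C, N)           # features of the partial point cloud
img_feat = torch.randn(B, C)                # global feature of the single image

scale_head = torch.nn.Linear(C, C)          # predict per-channel scale from image
shift_head = torch.nn.Linear(C, C)          # predict per-channel shift from image

mu = point_feat.mean(dim=2, keepdim=True)
std = point_feat.std(dim=2, keepdim=True) + 1e-5
normalized = (point_feat - mu) / std        # instance-normalize point features
styled = normalized * scale_head(img_feat).unsqueeze(2) \
       + shift_head(img_feat).unsqueeze(2)  # inject image-derived statistics
```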
Online and offline handwritten Chinese text recognition (HCTR) has been studied for decades. Early methods adopted over-segmentation-based strategies but suffered from low speed, insufficient accuracy, and the high cost of character segmentation annotations. Recently, segmentation-free methods based on connectionist temporal classification (CTC) and attention mechanisms have dominated the field of HCTR. However, people actually read text character by character, especially for ideograms such as Chinese. This raises the question: is the segmentation-free strategy really the best solution for HCTR? To explore this issue, we propose a new segmentation-based method for recognizing handwritten Chinese text, implemented with a simple yet effective fully convolutional network. A novel weakly supervised learning method is proposed so that the network can be trained using only transcript annotations; the expensive character segmentation annotations required by previous segmentation-based methods can thus be avoided. Owing to the lack of context modeling in fully convolutional networks, we propose a contextual regularization method to integrate contextual information into the network during the training stage, which can further improve recognition performance. Extensive experiments on four widely used benchmarks, namely CASIA-HWDB, CASIA-OLHWDB, ICDAR 2013, and SCUT-HCCDoc, show that our method significantly surpasses existing methods on both online and offline HCTR, and runs much faster than CTC/attention-based methods.
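For illustration only, here is a hypothetical minimal stand-in (not the paper's network) for the kind of backbone a segmentation-based recognizer builds on: a fully convolutional net mapping a text-line image to per-column character scores, with the class count assumed:

```python
import torch

num_classes = 7357                             # character-set size (assumed)
fcn = torch.nn.Sequential(
    torch.nn.Conv2d(1, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Conv2d(32, 64, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d((1, None)),     # collapse the height dimension
    torch.nn.Conv2d(64, num_classes, 1),       # 1x1 conv: per-column classifier
)
img = torch.randn(1, 1, 64, 512)               # one text-line image
logits = fcn(img).squeeze(2)                   # (B, num_classes, W'): column scores
```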
We propose a novel dynamically constrained uncertainty-weighted loss to experimentally address the problem of balancing the contributions of multiple tasks in the ICML ExVo 2022 Challenge. The multi-task setting aims to jointly recognize the emotions and demographic traits expressed in vocal bursts. Our strategy combines the advantages of uncertainty weighting and dynamic weight averaging by extending the weights with a constraint term, making the learning process more explainable. We use a lightweight multi-exit CNN architecture to implement our proposed loss approach. The experimental H-mean score (0.394) shows a significant improvement over the baseline H-mean score (0.335).
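As a reference, a sketch of plain homoscedastic uncertainty weighting (Kendall et al., 2018), the base that a dynamically constrained variant would extend; the constraint term itself is not specified in the abstract and is omitted here, and the example losses are placeholders:

```python
import torch

log_vars = torch.nn.Parameter(torch.zeros(2))   # one learned log-variance per task

def weighted_loss(task_losses):
    total = torch.zeros(())
    for i, loss in enumerate(task_losses):
        precision = torch.exp(-log_vars[i])     # 1 / sigma_i^2 down-weights
        total = total + precision * loss + log_vars[i]  # uncertain tasks, with a
    return total                                # log-variance regularizer

emotion_loss, demographic_loss = torch.tensor(0.8), torch.tensor(0.3)
print(weighted_loss([emotion_loss, demographic_loss]))
```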